Statistical Confidence for Variable Selection in QSAR Models via Monte Carlo Cross-Validation

نویسندگان

  • Dmitry A. Konovalov
  • Nigel Sim
  • Eric Deconinck
  • Yvan Vander Heyden
  • Danny Coomans
چکیده

A new variable selection wrapper method named the Monte Carlo variable selection (MCVS) method was developed utilizing the framework of the Monte Carlo cross-validation (MCCV) approach. The MCVS method reports the variable selection results in the most conventional and common measure of statistical hypothesis testing, the P-values, thus allowing for a clear and simple statistical interpretation of the results. The MCVS method is equally applicable to the multiple-linear-regression (MLR)-based or non-MLR-based quantitative structure-activity relationship (QSAR) models. The method was applied to blood-brain barrier (BBB) permeation and human intestinal absorption (HIA) QSAR problems using MLR to demonstrate the workings of the new approach. Starting from more than 1600 molecular descriptors, only two (TPSA(NO) and ALOGP) yielded acceptably low P-values for the BBB and HIA problems, respectively. The new method has been implemented in the QSAR-BENCH v2 program, which is freely available (including its Java source code) from www.dmitrykonovalov.org for academic use.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

WilcoxCV: an R package for fast variable selection in cross-validation

UNLABELLED In the last few years, numerous methods have been proposed for microarray-based class prediction. Although many of them have been designed especially for the case n << p (much more variables than observations), preliminary variable selection is almost always necessary when the number of genes reaches several tens of thousands, as usual in recent data sets. In the two-class setting, t...

متن کامل

Variational Bayesian Model Selection for Mixture Distributions

Mixture models, in which a probability distribution is represented as a linear superposition of component distributions, are widely used in statistical modeling and pattern recognition. One of the key tasks in the application of mixture models is the determination of a suitable number of components. Conventional approaches based on cross-validation are computationally expensive, are wasteful of...

متن کامل

Robust Cross-Validation of Linear Regression QSAR Models

A quantitative structure-activity relationship (QSAR) model is typically developed to predict the biochemical activity of untested compounds from the compounds' molecular structures. "The gold standard" of model validation is the blindfold prediction when the model's predictive power is assessed from how well the model predicts the activity values of compounds that were not considered in any wa...

متن کامل

Model Selection for Mixture Models Using Perfect Sample

We have considered a perfect sample method for model selection of finite mixture models with either known (fixed) or unknown number of components which can be applied in the most general setting with assumptions on the relation between the rival models and the true distribution. It is, both, one or neither to be well-specified or mis-specified, they may be nested or non-nested. We consider mixt...

متن کامل

Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation

BACKGROUND Generally, QSAR modelling requires both model selection and validation since there is no a priori knowledge about the optimal QSAR model. Prediction errors (PE) are frequently used to select and to assess the models under study. Reliable estimation of prediction errors is challenging - especially under model uncertainty - and requires independent test objects. These test objects must...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of chemical information and modeling

دوره 48 2  شماره 

صفحات  -

تاریخ انتشار 2008